Probing the mutational landscape of the SARS-CoV-2 spike protein via quantum mechanical modeling of crystallographic structures

Abstract We employ a recently developed complexity-reduction quantum mechanical (QM-CR) approach, based on complexity reduction of density functional theory calculations, to characterize the interactions of the SARS-CoV-2 spike receptor binding domain (RBD) with ACE2 host receptors and antibodies. QM-CR operates via ab initio identification of individual amino acid residue’s contributions to chemical binding and leads to the identification of the impact of point mutations. Here, we especially focus on the E484K mutation of the viral spike protein. We find that spike residue 484 hinders the spike's binding to the human ACE2 receptor (hACE2). In contrast, the same residue is beneficial in binding to the bat receptor Rhinolophus macrotis ACE2 (macACE2). In agreement with empirical evidence, QM-CR shows that the E484K mutation allows the spike to evade categories of neutralizing antibodies like C121 and C144. The simulation also shows how the Delta variant spike binds more strongly to hACE2 compared to the original Wuhan strain, and predicts that a E484K mutation can further improve its binding. Broad agreement between the QM-CR predictions and experimental evidence supports the notion that ab initio modeling has now reached the maturity required to handle large intermolecular interactions central to biological processes.

. Interaction networks (RBD WT @ macACE2 to the left, RBD E484K @ macACE2 to the right) Bonds are purple when intermolecular or black when intramolecular, their width being related to the strength of the FBO between residues. Graph nodes are represented in red (repulsive) or blue (attractive) based on their effect to their counterpart and highlighted by yellow squares when at the binding interface. Fig S3. Mechanistic characterization of C144 binding to the Wuhan spike, and energetic changes as a result of the E484K spike mutation. Data are plotted on the spike primary structure (panel a) and on C144's (panel b), to compare the Wuhan spike (WT) and the mutated one (E484K). Amino acids are represented by the letters and numbered on the histogram's horizontal axis. Histograms underneath the sequences represent the relative change in binding energy of the E484K mutant relative to the Wuhan strain. The bar plots (bottom right) represent the chemical and electrostatic contributions in the binding of C144 to the WT and mutated spikes. Fig S3. (cont.). Interaction networks (RBD WT @ C144 to the left, RBD E484K @ C144 to the right) at the bottom represent the interface residues and their coordinated interactors: squares are spike residues and circles C144's, their respective coloring is red for repulsive and blue for attractive energy, a yellow highlight represents interface residues. Bonds are purple when intermolecular or black when intramolecular. Interface residues are represented by red (repulsive) or blue (attractive) squares based on their effect to their counterpart and highlighted by yellow squares when at the binding interface.

The interaction energies of relevant residues are marginally affected by the implicit solvent
In light of the importance implicit solvent may have in the dynamics of binding, especially whenever electrostatic interactions play a substantial role, we verify its impact in the system under study. Results show that introducing an implicit solvent does not affect predictions. Data are plotted on C121structure to compare the Wuhan spike (WT) and the mutated one (E484K) effect on binding. Amino acids are represented by the letters and numbered on the histogram's horizontal axis. Histograms underneath the sequences represent the relative change in binding energy of the E484K mutant relative to the Wuhan strain. The bar plots (bottom right) represent the chemical and electrostatic contributions in the binding of C121 to the WT and mutated spikes. The impact of implicit solvent is highlighted by the blue bars, and is shown to be marginal.

Model comparison with preexisting experimental datasets on randomly generated libraries of hACE2 and viral spike mutants
The binding energies provided by our model can be compared, under reasonable assumptions on the structures employed, to the quantities obtained from experimental databases of libraries of the interaction between spike RBD mutants and hACE2 mutants, all available in the literature [8,9]. While the comparison between these quantities is unconventional, as computational studies of protein-protein affinity involve in-depth analysis of structural and thermodynamic contributions [58], results from both datasets are in general accordance (Fig S6).
We decompose the binding energy of the RBD with the ligand into per-residue contributions to describe the relative importance of each amino acid in the binding process. To see how these data compare with available experimental measurements, we generate a dataset of virtual pointmutations for the RBD-hACE2 system and compare the interaction energies with experimental affinity data obtained from existing high throughput random mutation experiments, which are available in the literature both for the Wuhan spike protein [9] and hACE2 [8]. Data are presented in Fig S6. As already pointed out in the main text, this comparison is unconventional, as computational studies of protein-protein affinity always include in-depth analysis of structural and thermodynamic contributions [58]. However, our simulation largely aligns with experimental results [8,9,35,36]. The tested single point mutations do not lead to substantial structural rearrangement of the spike. Interestingly, the E484K mutation stands out as an outlier in both the experimental dataset and our simulation, exhibiting an overall improved binding to hACE2. However, most mutations of the RBD residues lead to decreased affinity, indicating that the viral Wuhan spike is overall well-adapted to bind hACE2.
One case where our model does not agree with experimental results is the case of mutation N501Y. We compared the structure we generated by point mutation along with the Beta crystal with the analysis of a molecular dynamics trajectory performed by Spinello et al. [44]. In the trajectory, conformational change, realized over 2 microseconds of simulation time, a hydrogen bond is formed between Tyr501 and Asp38. This feature is not present in the crystals we analyzed. The improvement in binding associated with mutation N501Y may indeed be less "RBD-Modular" than E484K, requiring significant conformational change in combination with other mutations to realize improved binding. To understand this mutation in detail, future studies based on our QM-CR approach should be combined with cutting edge molecular dynamics simulation.

Fig S6. Comparison of QM-CR simulations of virtual structures with available in vitro binding performance of mutant libraries.
Spike mutations are represented for their affinity binding score towards hACE2 [9]. Negative score means depletion of the affinity of mutated sequence. Virtual crystals have been generated in silico and the binding strength simulated using QM-CR. The percent difference in binding strength is then calculated with respect to Wuhan strain, negative value indicating improvement of the binding. If both the in vitro and the in-silico data represent a depletion or an enhancement, data are in qualitative agreement and colored with rainbow colors, from red to purple (error bars and uncertainties are not shown), and with copper colors otherwise. The in vitro and the in silico quantities are represented against each other (horizontal axis for the QM-CR binding, vertical axis for the affinity), by focusing on the top 10 GISAID [59] point-mutations (plus the L452R point-mutation) which are found in SARS-CoV-2 variants at the time of writing this contribution. The same color scheme is employed for the mutation represented in the sequence and for the plots. The two datasets have many points in accordance, more than 60% of the points being in qualitative agreement. Among all the tested mutations, E484K (purple point) emerges as the strongest binding performance in qualitative agreement with the experiment.

Details of the Fragmentation procedure
We here recall the main equations that define the energy decomposition schemes employed in the present work. We identify a system from the set of its atomic positions. A QM calculation of the system provides the density matrix [ ].
We associate the atomic positions of the system to a set of ionic (electronic) charge densities ( ) , respectively. The expression of the total energy reads: (1) The DFT Hamiltonian is defined as ̂= − 1 2 ∇ 2 + V + V + V from the combination of the electrostatic potential provided by = + , including the exchange and correlation term.
When post-processing a QM DFT calculation, it is in theory possible to perform an analysis on an arbitrarily defined set of fragments. The challenge is to define those fragments in a chemically meaningful way, or, at least, to quantify the pertinence of a given fragmentation. To augment the choice of fragmentation scheme, in a previous study we introduced a measure of fragment quality called the purity indicator [33]. This measure is directly based on the density, being computed as the deviation from idempotency of the density matrix block associated with a given fragment.
The interactions of a system may be described in the same framework by computing a quasiobservable called the "Fragment Bond Order" as a measure of the off-diagonal contributions of the density matrix. As described in [33], the Purity Indicator and Fragment Bond Order are computed directly from the density matrix. The Purity Indicator and Fragment Bond Order together measure the competition between a fragment's internal and external interactions. This approach has recently been applied to generating graph views of proteins as well as to understand the interaction between proteins and solvent molecules [60].
Let us now suppose that our system is separated into two regions, which we call ℱ and . We also assume that those two regions are associated with well-defined fragments, with associated fragment projection operators ̂ℱ , . We can then identify, for instance • Electrostatic, long-range term: it is defined by the electrostatic interaction between the two fragments in their polarized state, i.e., the ground state of the assembly.